Clustering Learning Objects Collections Using Cluster Ensembles

Authors

  • Hanan Ayad
  • Mohamed Kamel
Abstract

Learning Object Repositories are increasingly being used in learning systems to provide high-quality, reusable educational materials. A relevant data mining problem associated with the automatic categorization of learning objects is the discovery of intrinsic classes based on the textual contents of the metadata records. In this paper, we present a cluster ensemble method that is applicable to distributed clustering of learning objects and that addresses the issues of large collections and the very high dimensionality of the vocabulary space. First, a significant initial reduction of the vocabulary space is proposed. Subsequently, a cluster ensemble method consisting of three stages is proposed. Base clusterings are generated for multiple random subsets of the data, and an adaptive voting algorithm is then applied to resolve the cluster label mapping problem. Finally, a consensus clustering is extracted by merging similar clusters that may be produced by fine-resolution base clusterings and by applying the maximum likelihood principle to the resulting assignment probability matrix. The experimental analysis shows an improvement in consensus clustering quality over the base clusterings, and performance competitive with clustering the original collection, as measured using normalized mutual information. The analysis validates the applicability of the proposed ensemble method.
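
The voting and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' adaptive voting algorithm: label alignment here is a brute-force search over label permutations (feasible only for a small number of clusters), the cluster-merging stage is omitted, and the final assignment simply takes the label with the most votes for each object. The function names, the use of None for objects outside a random subset, and the geometric-mean normalization of mutual information are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def current_consensus(votes):
    """Label with the most votes per object; None if the object has no votes yet."""
    return [None if sum(row) == 0 else max(range(len(row)), key=row.__getitem__)
            for row in votes]


def relabel_to_reference(reference, labels, k):
    """Rename labels with the permutation that agrees most with the reference.

    Brute force over all k! permutations: an illustrative stand-in for a real
    label-mapping procedure, usable only for small k.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, l in zip(reference, labels)
                    if r is not None and l is not None and perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [None if l is None else best_perm[l] for l in labels]


def consensus_by_voting(base_clusterings, k):
    """Align each base clustering to the running consensus and accumulate votes."""
    n = len(base_clusterings[0])
    votes = [[0] * k for _ in range(n)]
    for index, labels in enumerate(base_clusterings):
        if index > 0:
            labels = relabel_to_reference(current_consensus(votes), labels, k)
        for obj, lab in enumerate(labels):
            if lab is not None:  # None marks an object outside this subset
                votes[obj][lab] += 1
    return current_consensus(votes)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Mutual information normalized by sqrt(H(a) * H(b)) (assumed variant)."""
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((nij / n) * math.log(nij * n / (ca[i] * cb[j]))
             for (i, j), nij in joint.items())
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # Convention chosen here: two trivial partitions count as identical.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


# Toy data: three base clusterings of six objects, two clusters each.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],     # same partition, labels swapped
    [0, 0, None, 1, 1, 1],  # object 2 was not in this random subset
]
consensus = consensus_by_voting(base, k=2)
print(consensus)
print(nmi(consensus, ["a", "a", "a", "b", "b", "b"]))
```

In this toy run the second clustering is relabeled before its votes are counted, which is the label mapping problem the abstract refers to. Dividing each object's vote counts by its total would give a per-object assignment probability estimate, and taking the largest entry corresponds loosely to the maximum likelihood step.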


Related resources

Data as Ensembles of Records: Representation and Comparison

Many collections of data do not come packaged in a form amenable to the ready application of machine learning techniques. Nevertheless, there has been only limited research on the problem of preparing raw data for learning, perhaps because widespread differences between domains make generalization difficult. This paper focuses on one common class of raw data, in which the entities of interest a...


A Link-Based Cluster Collection Approach Combined Contagious Cluster With For Categorical Data Clustering

Data clustering is a challenging task in data mining technique. Various clustering algorithms are developed to cluster or categorize the datasets. Many algorithms are used to cluster the categorical data. Some algorithms cannot be directly applied for clustering of categorical data. Several attempts have been made to solve the problem of clustering categorical data via cluster ensembles. But th...


A Survey of Consensus Clustering

This chapter describes the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determine these partitionings – popularly known as the problem of “consensus clustering”. We illustrate different algorithms for solving the consensus clustering problem. The notion of dissimilarity between a pair of c...


A CLUE for CLUster Ensembles

Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including...


Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions

This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant ‘knowledge reuse’ framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinat...



Publication date: 2006